Abstract. Reputation is generally defined as the opinion of a group on
an aspect of a thing. This paper presents a reputation model that follows
a probabilistic modelling of opinions based on three main concepts:
(1) the value of an opinion decays with time, (2) the reputation of the
opinion source impacts the reliability of the opinion, and (3) the certainty
of the opinion impacts its weight with respect to other opinions.
Furthermore, the model is flexible with its opinion sources: it may use
explicit opinions or implicit opinions that can be extracted from agent
behaviour in domains where explicit opinions are sparse. We illustrate
the latter with an approach to extract opinions from behavioural information
in the sports domain, focusing on football in particular. One
of the uses of a reputation model is predicting behaviour. We take up
the challenge of predicting the behaviour of football teams in football
matches, which we argue is a very interesting yet difficult approach for
evaluating the model.

1 Introduction

This paper is concerned with the classic, yet crucial, issue of reputation. We 
propose MORE, the Merged Opinions REputation model, to compute reputation 
on the basis of opinions collected over time. MORE uses a probabilistic modelling 
of reputation; adopts the notion of information decay; considers the reliability 
of an opinion as a function of the reputation of the opinion holder; and assesses 
the weight of an opinion based on its certainty. This last feature is
the most novel aspect of our algorithm.

Furthermore, MORE may be applied to fields with varying abundance of
explicit opinions. In other words, if explicit opinions are available, as
is the case with so-called eMarkets, then those opinions may be used directly
by MORE. In other cases, where such opinions are sparse, behavioural information
can be translated into opinions that MORE can then use. For example,
if Barcelona beats Real Madrid at football, then this may be translated into
mutual opinions where Barcelona expresses Real Madrid’s inadequate skills and
Real Madrid acknowledges Barcelona’s superior skills. This paper also proposes
an approach for extracting opinions from behavioural information in the sports
domain.
MORE’s calculated reputation measures may then be used for different objectives,
from ranking performance to predicting behaviour and sports results.

Evaluating reputation is a notoriously tricky task, since there seldom is an
objective measure to compare to. For instance, how can we prove which opinion
is correct and which is biased? In this paper, we present an extensive validation
effort that has sought to assess MORE’s predictive abilities in the football
domain, where accurate predictions are notoriously hard to make [4].

The rest of this paper is organised as follows: Section 2 presents the MORE
model. Section 3 introduces the necessary approximations. Section 4 summarises
the MORE algorithm. Sections 5, 6 and 7 present our evaluation, before concluding
with Section 8.

2 The MORE Model 

We define the opinion that agent $\beta$ may form about agent $\alpha$ at time $t$ as:
$o^t_\beta(\alpha) = \{e_1 \mapsto v_1, \ldots, e_n \mapsto v_n\}$, where $G = \{\alpha, \beta, \ldots\}$ is a set of agents;
$t \in T$ and $T$ represents calendar time; $E = \{e_1, \ldots, e_n\}$ is an ordered evaluation space where
the terms $e_i$ may account for terms such as bad, good, very good and so on;
and $v_i \in [0,1]$ represents the value assigned to each element $e_i \in E$ under the
condition that $\sum_{i \in [1,|E|]} v_i = 1$. In other words, the opinion is specified as
a discrete probability distribution over the evaluation space $E$. We note that the
opinion one holds with respect to another may change with time, hence various
instances of $o^t_\beta(\alpha)$ may exist for the same agents $\alpha$ and $\beta$ but for distinct time
instants $t$.

Now assume that at time $t$, agent $\beta$ forms an opinion $o^t_\beta(\alpha)$ about agent $\alpha$.
To be able to properly interpret the opinion, we need to consider how reliable
$\beta$ is in giving opinions. We reckon that the overall reliability of any opinion
is the reliability of the person holding this opinion, which changes over time.
That is, the more reliable an opinion is, the closer its reviewed value is to the
original one; inversely, the less reliable an opinion is, the closer its reviewed value
is to the flat (or uniform) probability distribution $F$, which represents complete
ignorance and is defined as $\forall e_i \in E \cdot F(e_i) = 1/|E|$. This reliability value $\mathcal{R}^t_\beta$
is defined later on in Section 2.3. However, in this section, we use this value to
assess the reviewed value $O^t_\beta(\alpha)$ of the expressed opinion $o^t_\beta(\alpha)$, which we define
accordingly:

$$O^t_\beta(\alpha) = \mathcal{R}^t_\beta \times o^t_\beta(\alpha) + (1 - \mathcal{R}^t_\beta) \times F \qquad (1)$$
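As an illustration of Equation 1, the following minimal Python sketch blends a raw opinion with the flat distribution according to a reliability value. The function names `flat` and `review` and the sample opinion are hypothetical and chosen only for this example.

```python
def flat(n):
    """Uniform (flat) distribution F over an evaluation space of n terms."""
    return [1.0 / n] * n


def review(opinion, reliability):
    """Equation 1: reliability * opinion + (1 - reliability) * F."""
    f = flat(len(opinion))
    return [reliability * o + (1.0 - reliability) * u
            for o, u in zip(opinion, f)]


# Hypothetical opinion over the ordered space (bad, good, very good).
raw = [0.0, 0.2, 0.8]
fully_trusted = review(raw, 1.0)   # the opinion is kept as expressed
untrusted = review(raw, 0.0)       # the opinion collapses to F
half_trusted = review(raw, 0.5)    # halfway between the two
```

A reliability of 1 leaves the opinion untouched and a reliability of 0 replaces it with complete ignorance; intermediate values interpolate linearly, so the result is always a probability distribution.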


2.1 Opinion Decay 

Information loses its value with time. Opinions are no exception, and their integrity
decreases with time as well. Based on the work of [8], we say the value
of an opinion should tend to ignorance, which may be represented by the flat
distribution $F$. In other words, given a distribution $O^{t'}$ created at time $t'$, we
say that at time $t > t'$, $O^{t'}$ would have decayed to $O^{t' \to t} = \Lambda(t, F, O^{t'})$, where $\Lambda$ is the
decay function satisfying the property $\lim_{t \to \infty} O^{t' \to t} = F$.

One possible definition, used by MORE, for $\Lambda$ is the following:

$$O^{t' \to t} = \nu^{\Delta_t} O^{t'} + (1 - \nu^{\Delta_t}) F \qquad (2)$$

where $\nu \in [0,1]$ is the decay rate, and:

$$\Delta_t = \begin{cases} 0, & \text{if } t - t' < \kappa \\ 1 + \frac{t - t'}{\kappa}, & \text{otherwise} \end{cases} \qquad (3)$$

$\Delta_t$ serves the purpose of establishing a minimum grace period during which
the information does not decay; once that period has elapsed, the information starts
decaying. This grace period is determined by the parameter $\kappa$, which is also
used to control the pace of decay.
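The decay step can be sketched in Python as follows. This is only a sketch under stated assumptions: the exponent is taken to be 0 while $t - t' < \kappa$ and $1 + (t - t')/\kappa$ afterwards, and the function names `delta` and `decay` are hypothetical.

```python
def delta(t, t_prime, kappa):
    """Assumed exponent: 0 during the grace period, 1 + (t - t')/kappa after it."""
    elapsed = t - t_prime
    return 0.0 if elapsed < kappa else 1.0 + elapsed / kappa


def decay(opinion, t, t_prime, nu, kappa):
    """Equation 2: move the opinion towards the flat distribution F."""
    weight = nu ** delta(t, t_prime, kappa)
    n = len(opinion)
    return [weight * p + (1.0 - weight) / n for p in opinion]


within_grace = decay([0.1, 0.9], t=3, t_prime=0, nu=0.9, kappa=5)
after_grace = decay([0.1, 0.9], t=5, t_prime=0, nu=0.9, kappa=5)
long_after = decay([0.1, 0.9], t=500, t_prime=0, nu=0.5, kappa=5)
```

Within the grace period the opinion is returned unchanged; afterwards the weight on the original opinion shrinks geometrically, so a sufficiently old opinion becomes indistinguishable from the flat distribution.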


2.2 Certainty and its Impact on Group Opinion 

A group opinion on something at some moment is based on the aggregation
of all the previously-expressed individual opinions. However, the certainty of
each of these individual opinions has a crucial impact on the aggregation. This
is a concept that, to our knowledge, has not been used in existing aggregation
methods for reputation. We say that the more uncertain an opinion is, the smaller
its effect on the final group opinion. The maximum uncertainty is defined in
terms of the flat distribution $F$. Hence, we define this certainty measure, which
we refer to as the opinion’s value of information, as follows:

$$\mathcal{I}(O^t_\beta(\alpha)) = \mathcal{H}(O^t_\beta(\alpha)) - \mathcal{H}(F) \qquad (4)$$

where $\mathcal{H}$ represents the entropy of a probability distribution, or the value
of information of a probability distribution. In other words, the certainty of an
opinion is the difference in entropies of the opinion and the flat distribution.
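A small Python sketch of this measure follows. It assumes Shannon entropy with natural logarithms (the base is not fixed by the text), and the names `entropy` and `information` are hypothetical. Computed literally as the entropy of the opinion minus the entropy of the flat distribution, the value is zero for the flat distribution and negative otherwise; its magnitude grows with certainty, and since it is used as a weight in both the numerator and the denominator of the aggregation, the sign cancels.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log); zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def information(dist):
    """Equation 4 as written: H(opinion) - H(F), with F uniform over the same space."""
    return entropy(dist) - math.log(len(dist))


uncertain = information([0.5, 0.5])   # flat opinion: no information
certain = information([0.0, 1.0])     # fully certain opinion: largest magnitude
```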

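Then, when computing the group opinion, we say that any agent can give
opinions about another at different moments in time. We define $T_\beta(\alpha) \subseteq T$ to
describe the set of time points at which $\beta$ has given opinions about $\alpha$. The group
opinion about $\alpha$ at time $t$, $O^t_G(\alpha)$, is then calculated as follows:

$$O^t_G(\alpha) = \frac{\sum_{\beta \in G} \sum_{t' \in T_\beta(\alpha)} O^{t' \to t}_\beta(\alpha) \cdot \mathcal{I}(O^{t' \to t}_\beta(\alpha))}{\sum_{\beta \in G} \sum_{t' \in T_\beta(\alpha)} \mathcal{I}(O^{t' \to t}_\beta(\alpha))} \qquad (5)$$

This equation states that the group opinion is an aggregation of all the decayed
individual opinions $O^{t' \to t}_\beta(\alpha)$ that represent the view of every agent $\beta$ that
has expressed an opinion about $\alpha$ at some point $t'$ in the past. However, different
views are given different weights, depending on the value of their information
$\mathcal{I}(O^{t' \to t}_\beta(\alpha))$.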
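The exact aggregation of Equation 5 can be sketched as below. The helper names, the natural-log entropy, the assumed form of the decay exponent, and the fallback to the flat distribution when all weights are zero are assumptions made for this illustration only.

```python
import math


def information(dist):
    """Value of information: entropy of dist minus entropy of the flat distribution."""
    h = -sum(p * math.log(p) for p in dist if p > 0.0)
    return h - math.log(len(dist))


def decay(dist, elapsed, nu, kappa):
    """Decay towards the flat distribution (assumed grace-period exponent)."""
    w = nu ** (0.0 if elapsed < kappa else 1.0 + elapsed / kappa)
    return [w * p + (1.0 - w) / len(dist) for p in dist]


def group_opinion(opinions, n, t, nu, kappa):
    """opinions: list of (t_prime, reviewed_distribution) about a single agent."""
    numerator, denominator = [0.0] * n, 0.0
    for t_prime, dist in opinions:
        decayed = decay(dist, t - t_prime, nu, kappa)
        weight = information(decayed)
        numerator = [acc + weight * p for acc, p in zip(numerator, decayed)]
        denominator += weight
    if denominator == 0.0:            # no informative opinion: stay at F (assumption)
        return [1.0 / n] * n
    return [acc / denominator for acc in numerator]


# A certain opinion outweighs a flat one, which carries (almost) no weight.
g = group_opinion([(0, [0.0, 1.0]), (0, [0.5, 0.5])], n=2, t=0, nu=0.9, kappa=5)
```

Every call decays and re-weights the full history, which is the cost that motivates the approximation introduced later in the paper.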

Note that in the proposed approach, one’s latest opinion does not override
previous opinions. The choice of whether to override previous opinions is
context dependent. For example, consider someone providing an opinion about a certain
product on the market, then changing their opinion after using the product
for some time. In such a case, only the latest opinion should be considered and
it should override the previous opinion. However, in our experiments, we use
the sports domain, where the results of football matches are interpreted as opinions
formed by the teams about each other’s strength in football. In such a case,
the opinions obtained from the latest match’s score should not override opinions
obtained from previous matches. In such a context, past opinions resulting
from previous matches will still need to be considered when assessing a team’s
reputation.

Finally, we note that initially, at time $t_0$, we have $\forall \alpha \in G \cdot O^{t_0}_G(\alpha) = F$. In
other words, in the absence of any information, the group opinion is equivalent to
the flat distribution accounting for maximum ignorance. As individual opinions
are expressed, the group opinion starts changing following Equation 5.

2.3 Reliability and Reputation 

An essential point in evaluating the opinions held by someone is considering 
how reliable they are. This is used in the interpretation of the opinions issued 
by agents (Equation 1). The idea behind the notion of reliability is very simple.
A person who is considered very good at solving a certain task, i.e. has a high 
reputation with respect to that task, is usually considered an expert in assessing 
issues related to that task. This is a kind of ex-cathedra argument. An example 
of current practice supported by this argument is the selection of members of 
committees or advisory boards. 

But how is reputation calculated? First, given an evaluation space $E$, it is easy
to see what could be the best opinion about someone: the ‘ideal’ distribution,
or the ‘target’, which is defined as $\mathbb{T} = \{e_n \mapsto 1\}$, where $e_n$ is the top term in
the evaluation space. Then, the reputation of $\beta$ within a group $G$ at time $t$ may
be defined as one minus the distance between the current aggregated opinion of the group
$O^t_G(\beta)$ and the ideal distribution $\mathbb{T}$, as follows:

$$\mathcal{R}^t_\beta = 1 - emd(O^t_G(\beta), \mathbb{T}) \qquad (6)$$

where $emd$ is the earth mover’s distance that measures the distance between
two probability distributions [7] (although other distance measurements may
also be used). The range of the $emd$ function is [0,1], where 0 represents the
minimum distance (i.e. both distributions are identical) and 1 represents the
maximum distance possible between the two distributions.

As time passes and opinions are formed, the reputation measure evolves along
with the group opinion. Furthermore, at any moment in time, the measure $\mathcal{R}^t$
can be used to rank the different agents as well as assess their reliability.
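For an ordered, one-dimensional evaluation space, the earth mover's distance to the target distribution can be computed from cumulative sums. The Python sketch below assumes a ground distance of $1/(n-1)$ between adjacent terms, which keeps the distance in [0,1]; this normalisation and the function names are assumptions of the example rather than details given in the text.

```python
def emd_to_target(dist):
    """Normalised 1-D earth mover's distance between dist and the target,
    i.e. the distribution with all mass on the top (last) term."""
    n = len(dist)
    if n == 1:
        return 0.0
    cumulative, work = 0.0, 0.0
    for p in dist[:-1]:
        cumulative += p      # mass that still has to move up past this term
        work += cumulative
    return work / (n - 1)


def reputation(group_dist):
    """Equation 6: one minus the distance to the ideal distribution."""
    return 1.0 - emd_to_target(group_dist)


best = reputation([0.0, 0.0, 1.0])        # all mass on the top term
worst = reputation([1.0, 0.0, 0.0])       # all mass on the bottom term
ignorant = reputation([1 / 3, 1 / 3, 1 / 3])
```

Under these assumptions the flat distribution yields a reputation of 0.5, so an agent about whom nothing is known sits midway between the best and the worst possible reputation.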

3 Necessary Approximation 

As Equation 5 illustrates, the group opinion is calculated by aggregating the
decayed individual opinions and normalising the final aggregated distribution by
considering the value of the information of each decayed opinion ($\mathcal{I}(O^{t' \to t}_\beta(\alpha))$).

This approach imposes severe efficiency constraints as it demands exceptional 
computing power: each time the group opinion needs to be calculated, all past 
opinions need to decay to the time of the request, and the value of the information 
of these decayed opinions should be recomputed. 

We suggest an approximation to Equation 5 that allows us to apply the
algorithm over a much longer history of opinions. To achieve this, when a group
opinion is requested, its value is calculated by obtaining the latest group opinion
and decaying it accordingly. In other words, we assume the group opinion to
decay just like any other source of information. Instead of recalculating it over
and over again, we simply decay the latest calculated value following Equation 2,
as follows:

$$O^{t' \to t}_G(\alpha) = \nu^{\Delta_t} O^{t'}_G(\alpha) + (1 - \nu^{\Delta_t}) F$$


When a new opinion is added, the new group opinion is then updated by
adding the new opinion to the decayed group opinion. In this case, normalisation
is still achieved by considering the value of the information of the opinions being
aggregated; however, it also considers the number of opinions used to calculate
the latest group opinion. This is because one new opinion should not have the
same weight as all the previous opinions combined. In other words, more weight
should be given to the group opinion, and this weight should be based on the
number of individual opinions contributing to that group opinion. As such, when
a new opinion $O^t_\beta(\alpha)$ is added, Equation 5 is replaced with Equation 7:

$$O^t_G(\alpha) = \frac{n_\alpha \times O^{t' \to t}_G(\alpha) \cdot \mathcal{I}(O^{t' \to t}_G(\alpha)) + O^t_\beta(\alpha) \cdot \mathcal{I}(O^t_\beta(\alpha))}{n_\alpha \times \mathcal{I}(O^{t' \to t}_G(\alpha)) + \mathcal{I}(O^t_\beta(\alpha))} \qquad (7)$$

where $n_\alpha$ represents the number of opinions used to calculate the group
opinion about $\alpha$.
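The incremental update of Equation 7 can be sketched in a few lines of Python. The function names and the choice to keep the decayed group opinion when both weights are zero are assumptions of this example; natural logarithms are assumed for the entropy.

```python
import math


def information(dist):
    """Value of information: entropy of dist minus entropy of the flat distribution."""
    h = -sum(p * math.log(p) for p in dist if p > 0.0)
    return h - math.log(len(dist))


def update_group(decayed_group, n_alpha, new_opinion):
    """Equation 7: merge one new reviewed opinion into the decayed group opinion,
    weighting the group opinion by the number of opinions behind it."""
    w_old = n_alpha * information(decayed_group)
    w_new = information(new_opinion)
    total = w_old + w_new
    if total == 0.0:                  # nothing informative on either side (assumption)
        return list(decayed_group)
    return [(w_old * g + w_new * o) / total
            for g, o in zip(decayed_group, new_opinion)]


# First opinion ever received: the flat prior carries no weight.
first = update_group([0.5, 0.5], 0, [0.2, 0.8])
# A dissenting opinion moves a group opinion backed by 10 opinions only slightly.
later = update_group([0.2, 0.8], 10, [0.8, 0.2])
```

Only the latest group opinion and the counter $n_\alpha$ need to be stored, which is what makes each update independent of the length of the opinion history.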

Of course, this approach provides an approximation that is not equivalent to
the exact group opinion calculated following Equation 5. This is mainly because
the chosen decay function (Equation 2) is not a linear function, since the decay
parameter $\nu$ is raised to the exponent $\Delta_t$, which is time dependent. In other
words, decaying the group opinion as a whole results in a different probability
distribution than decaying all the individual opinions separately and aggregating
the results following Equation 5. Hence, there is a need to know how close
the approximate group opinion is to the exact one. In what follows, we introduce the
test used for comparing the two, along with the results of this test.


3.1 The Approximation Test 

To test the proposed approximation, we generate a number of random opinions
$O^t_{\beta_i}(\alpha)$ over a number of years, where $\alpha$ is fixed, $\beta_i$ is an irrelevant variable
(although we do count the number of opinion sources every year, the identity of
the source itself is irrelevant in this specific experiment), and $t$ varies according to
the constraints set by each experiment. For example, if 4 opinions were generated
every year for a period of 15 years, then 15 sets of 4 opinions each would be
generated, one set per year.

With every generated opinion, the group opinion is calculated following both
the exact model (Equation 5) and the approximate model (Equation 7). We
then plot the distance between the exact group opinion and the approximate
one. The distance between those two distributions is calculated using the earth
mover’s distance method outlined earlier. We note that a good approximation is
an approximation where the earth mover’s distance (EMD) is close to 0.

Two different experiments were executed. In the first, 10 opinions were
generated every year over a period of 6 years. In the second, 4 opinions were
generated every year over a period of 15 years. Each of these experiments
was repeated several times to test a variety of decay parameters. The final
results of these experiments are presented in the following section.


3.2 Results of the Approximation Test 

Figure 1 presents the results of the first experiment introduced above. The results
show that the approximation error increases to around 11% in the first few
rounds; after 12 opinions have been introduced, it starts
to decrease steadily until it reaches 0.3% when 60 opinions have been
added. Experiment 2 produced the same results, although spanning 15 years
instead of 6. For this reason, as well as for lack of space, we do not present
the second experiment’s results here. However, we point out that both experiments
illustrate that it is the number of opinions that affects the increase/decrease in
the EMD, rather than the number of years and the decay parameters.
In fact, undocumented results illustrate that the results of Figure 1 provide a
good estimate of the worst case scenarios, since the earth mover’s distance does
not grow much larger for smaller $\nu$ values, but starts decreasing towards 0. When
$\nu = 0$ and the decay is maximal (i.e. opinions decay to the flat distribution at
every timestep), the EMD is 0. However, when the decay is minimal
(i.e. opinions never decay), then the results are very close to the case of $\nu = 0.98$
and $\kappa = 5$.

Fig. 1. Distance between the exact and approximate $O_G$

We conclude that the larger the number of available opinions, the more
precise the approximation is. This makes the approximation suitable for applications
where more and more opinions are available.

4 The MORE Algorithm 

The merged opinions reputation model, MORE, is implemented using the approximation
of Section 3 and formalised by Algorithm 1.


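Algorithm 1 The MORE Algorithm

Require: $E = \{e_1, \ldots, e_n\}$ to be an evaluation space
Require: $G = \{\alpha, \beta, \ldots\}$ to be a group of agents
Require: $t \in \mathbb{N}$ to be a point in time
Require: ODB to describe the database of all opinions
Require: $o^t_\beta(\alpha) < o^{t'}_\delta(\gamma) = \{\top, \text{if } t < t'; \bot, \text{otherwise}\}$
Require: $O^{t' \to t}_X(\alpha) = \nu^{\Delta_t} (O^{t'}_X(\alpha) - F) + F$, where $\nu \in [0,1]$ is the decay parameter,
    $\kappa \in \mathbb{N}$ is the pace of decay, and $\Delta_t$ is defined as in Section 2.1
Require: $emd$ to represent the earth mover’s distance function that calculates
    the distance, in [0,1], between two probability distributions

$\forall e_i \in E \cdot F(e_i) = 1/n$
$\forall e_i \in E \cdot (i < |E| \Rightarrow \mathbb{T}(e_i) = 0) \wedge (i = |E| \Rightarrow \mathbb{T}(e_i) = 1)$
$\mathcal{H}(F) = -\log(1/n)$
$\forall \alpha \in G \cdot O^{t_0}_G(\alpha) = F$
$\forall \alpha \in G \cdot n_\alpha = 0$
while $\exists\, o^t_\beta(\alpha) \in ODB \cdot (\forall o \in ODB \cdot o^t_\beta(\alpha) < o)$ do
    $\mathcal{R}^t_\beta = 1 - emd(O^{t' \to t}_G(\beta), \mathbb{T})$
    $O^t_\beta(\alpha) = \mathcal{R}^t_\beta \times o^t_\beta(\alpha) + (1 - \mathcal{R}^t_\beta) \times F$
    $\mathcal{I}(O^t_\beta(\alpha)) = -\sum_{e_i \in E} O^t_\beta(\alpha)(e_i) \cdot \log O^t_\beta(\alpha)(e_i) - \mathcal{H}(F)$
    $\mathcal{I}(O^{t' \to t}_G(\alpha)) = -\sum_{e_i \in E} O^{t' \to t}_G(\alpha)(e_i) \cdot \log O^{t' \to t}_G(\alpha)(e_i) - \mathcal{H}(F)$
    $O^t_G(\alpha) = \frac{n_\alpha \times O^{t' \to t}_G(\alpha) \cdot \mathcal{I}(O^{t' \to t}_G(\alpha)) + O^t_\beta(\alpha) \cdot \mathcal{I}(O^t_\beta(\alpha))}{n_\alpha \times \mathcal{I}(O^{t' \to t}_G(\alpha)) + \mathcal{I}(O^t_\beta(\alpha))}$
    $\mathcal{R}^t_\alpha = 1 - emd(O^t_G(\alpha), \mathbb{T})$
    $n_\alpha = n_\alpha + 1$
end while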


In summary, the algorithm is called with a predefined set of opinions, or the
opinions database ODB. For each opinion in ODB, the reviewed value is calculated
following Equation 1; the informational value of the opinion, as well as that of the
decayed latest group opinion, is calculated following Equation 4; the updated
group opinion is then calculated following Equation 7; and the reputation of the
agent is calculated via Equation 6. These steps are repeated for all opinions in
ODB in ascending order of time, starting from the earliest given opinion and
moving towards the latest given opinion.
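The steps summarised above can be put together in a compact Python sketch. It is an illustration under several assumptions, not a reference implementation: opinions are tuples `(t, source, target, raw_opinion)`, the decay exponent uses an assumed grace-period form, the earth mover's distance is normalised for an ordered one-dimensional space, and all function and variable names are hypothetical.

```python
import math


def information(d):
    """Entropy of d minus entropy of the flat distribution (natural log assumed)."""
    return -sum(p * math.log(p) for p in d if p > 0.0) - math.log(len(d))


def decay(d, elapsed, nu, kappa):
    """Decay towards the flat distribution after a grace period of kappa."""
    w = nu ** (0.0 if elapsed < kappa else 1.0 + elapsed / kappa)
    return [w * p + (1.0 - w) / len(d) for p in d]


def reputation(d):
    """One minus the normalised 1-D earth mover's distance to the top term."""
    cumulative = work = 0.0
    for p in d[:-1]:
        cumulative += p
        work += cumulative
    return 1.0 - work / (len(d) - 1)


def more(odb, agents, n, nu, kappa):
    """Process opinions in ascending time order and return each agent's reputation."""
    group = {a: [1.0 / n] * n for a in agents}   # group opinions start at F
    stamp = {a: 0 for a in agents}               # time of the latest update
    count = {a: 0 for a in agents}               # n_alpha
    for t, src, tgt, raw in sorted(odb, key=lambda o: o[0]):
        # Reliability of the source: reputation from its decayed group opinion.
        rel = reputation(decay(group[src], t - stamp[src], nu, kappa))
        reviewed = [rel * p + (1.0 - rel) / n for p in raw]
        old = decay(group[tgt], t - stamp[tgt], nu, kappa)
        w_old = count[tgt] * information(old)
        w_new = information(reviewed)
        if w_old + w_new != 0.0:
            group[tgt] = [(w_old * g + w_new * o) / (w_old + w_new)
                          for g, o in zip(old, reviewed)]
        else:
            group[tgt] = old                     # nothing informative (assumption)
        stamp[tgt], count[tgt] = t, count[tgt] + 1
    return {a: reputation(group[a]) for a in agents}


# Hypothetical match at time 1: "a" beats "b", giving two mutual opinions over (B, G).
odb = [(1, "a", "b", [1.0, 0.0]), (1, "b", "a", [0.0, 1.0])]
scores = more(odb, ["a", "b"], n=2, nu=0.9, kappa=5)
```

Because opinions are processed one at a time, the order of opinions carrying the same timestamp affects the reliabilities used; in this sketch ties keep their order in the input.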






We note that the cost of each group opinion update in this algorithm is constant ($O(1)$),
whereas if we were using Equation 5 as opposed to the proposed approximation,
the cost would be linear with respect to the number of opinions $n$ ($O(n)$).
For very large datasets, such as those used in the experiment of Section 7, the
approximation does provide a great advantage.

5 From Raw Scores to Opinions 

This section describes the extraction of opinions from behavioural information.
While we focus on football, we note that these methods may easily be applied to
other domains. We say the possible outcomes of a match between teams $\alpha$ and
$\beta$ are as follows: (i) $\alpha$ wins, (ii) $\alpha$ loses, or (iii) the match ends in a draw.
We denote as $ng(\alpha)$ (resp., $ng(\beta)$) the number of goals scored by $\alpha$ (resp., $\beta$).
We then define three methods to convert match results into opinions. Generated
opinions belong to a binary evaluation space consisting of two outcomes, namely
bad ($B$) and good ($G$): $E = \{B, G\}$.

5.1 The Naive Conversion 

In this first strategy, we simply look for the winner. If $\alpha$ wins, then it receives
an opinion from $\beta$ equal to $o^t_\beta(\alpha) = \{B \mapsto 0, G \mapsto 1\}$, and $\beta$ will get an opinion
from $\alpha$ equal to $o^t_\alpha(\beta) = \{B \mapsto 1, G \mapsto 0\}$. In case of a draw, they both get the
same opinion: $o^t_\beta(\alpha) = o^t_\alpha(\beta) = \{B \mapsto 0.5, G \mapsto 0.5\}$. The method is quite simple
and it does not take into account important aspects such as the final score of
the match. For instance, losing 0 to 3 is equivalent to losing 2 to 3.
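A minimal Python sketch of the Naive conversion follows; opinions are represented as pairs (probability of bad, probability of good), and the function name `naive_opinions` is hypothetical.

```python
def naive_opinions(goals_a, goals_b):
    """Return (opinion about a held by b, opinion about b held by a),
    each as a pair (P(bad), P(good))."""
    if goals_a > goals_b:
        return (0.0, 1.0), (1.0, 0.0)
    if goals_a < goals_b:
        return (1.0, 0.0), (0.0, 1.0)
    return (0.5, 0.5), (0.5, 0.5)


# The score is ignored beyond the winner: 3-0 and 3-2 give the same opinions.
same = naive_opinions(3, 0) == naive_opinions(3, 2)
```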

5.2 Margin-of-Victory Conversion 

A second strategy we consider is called Margin of Victory - MV. The margin of
victory of a match involving clubs $\alpha$ and $\beta$ is defined as the difference of goals
$M = ng(\alpha) - ng(\beta)$ scored by $\alpha$ and $\beta$. Of course $M > 0$ if $\alpha$ wins. The main
idea here is this: if we know $\alpha$ beats $\beta$, this tells us something about the relative
strength of $\alpha$ against $\beta$. If we know $\alpha$ scored more than 3 goals against $\beta$ (which
is rather unusual in many professional leagues), we could probably have a better
picture of the relative strength of the two clubs. We believe that including more
data in the process of generating opinions should produce more accurate results
and, ultimately, this should help us in better predicting the outcome of a football
match. The rules we used to include the number of goals scored by each club are
as follows:

$$o^t_\alpha(\beta) = \begin{cases} \{B \mapsto 0.5, G \mapsto 0.5\}, & \text{for a 0-0 tie} \\ \left\{B \mapsto \frac{ng(\alpha)}{ng(\alpha)+ng(\beta)}, G \mapsto \frac{ng(\beta)}{ng(\alpha)+ng(\beta)}\right\}, & \text{otherwise} \end{cases} \qquad (8)$$


In analogous fashion we can compute the opinion of $\beta$ on $\alpha$. Equation 8 tells
us that if the margin of victory $ng(\alpha) - ng(\beta)$ is large, then $ng(\alpha)$ is higher than
$ng(\beta)$ and the ratio $\frac{ng(\alpha)}{ng(\alpha)+ng(\beta)}$ is closer to 1. As a consequence, the larger
the margin of victory between $\alpha$ and $\beta$, the more likely $\alpha$ will get an evaluation
biased towards good. In case of a 0-0 tie, the terms $\frac{ng(\alpha)}{ng(\alpha)+ng(\beta)}$ and $\frac{ng(\beta)}{ng(\alpha)+ng(\beta)}$
are undefined. To manage such a configuration, we assume that the probability
that $\alpha$ (resp., $\beta$) gets the evaluation good is equal to the probability it gets the
evaluation bad.

A potential drawback of the MV strategy is that different scores may be
translated into the same distribution. This happens every time one of the clubs
does not score any goal. For instance, the winners in two matches that end with
the scores 1-0 and 4-0 would both receive the opinion $\{B \mapsto 0, G \mapsto 1\}$, as calculated
by the MV strategy.
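The MV conversion and the drawback just described can be reproduced with a short Python sketch. The function name `mv_opinion_about` is hypothetical; an opinion is a pair (probability of bad, probability of good) about the team that scored `goals_for`.

```python
def mv_opinion_about(goals_for, goals_against):
    """Margin-of-Victory opinion about the team that scored goals_for."""
    total = goals_for + goals_against
    if total == 0:                      # 0-0 tie: ratios undefined, use 0.5 / 0.5
        return (0.5, 0.5)
    return (goals_against / total, goals_for / total)


# Drawback: a clean-sheet win always maps to the same opinion.
narrow = mv_opinion_about(1, 0)
wide = mv_opinion_about(4, 0)
```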

5.3 Gifted Margin of Victory 

The third strategy we propose is called the Gifted Margin of Victory - GMV.
It has been designed to efficiently handle the case of football matches in which
one of the clubs does not score any goal. The GMV strategy computes opinions
accordingly:

$$o^t_\beta(\alpha) = \left\{B \mapsto \frac{ng(\beta)+X}{ng(\alpha)+ng(\beta)+2X}, G \mapsto \frac{ng(\alpha)+X}{ng(\alpha)+ng(\beta)+2X}\right\} \qquad (9)$$

$$o^t_\alpha(\beta) = \left\{B \mapsto \frac{ng(\alpha)+X}{ng(\alpha)+ng(\beta)+2X}, G \mapsto \frac{ng(\beta)+X}{ng(\alpha)+ng(\beta)+2X}\right\} \qquad (10)$$

In other words, we gift both clubs a bonus of $X > 0$ goals in
order to manage all matches in which one (or possibly both) of the two clubs
does not score any goal. Here $X$ is any positive real number. If $X \to 0$, then the
GMV strategy collapses to the MV strategy. On the other hand, if $X$ is
extremely large then the constant $X$ dominates both $ng(\alpha)$ and $ng(\beta)$,
and the terms $\frac{ng(\alpha)+X}{ng(\alpha)+ng(\beta)+2X}$ and $\frac{ng(\beta)+X}{ng(\alpha)+ng(\beta)+2X}$ converge to 0.5. This
result is potentially negative because the probability that any team is evaluated
as good is substantially equivalent to the probability that it is evaluated as bad
and, therefore, all the opinions would be intrinsically uncertain. An experimental
analysis was carried out to identify the value of $X$ guaranteeing the highest
prediction accuracy. Due to space limitations we omit the discussion of the
experimental tuning of the $X$ parameter and report only the result of our
experiment: the best value found for $X$ was 1.
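A Python sketch of the GMV conversion follows, using the value X = 1 reported above as the default. The function name `gmv_opinion_about` is hypothetical; an opinion is a pair (probability of bad, probability of good) about the team that scored `goals_for`.

```python
def gmv_opinion_about(goals_for, goals_against, x=1.0):
    """Gifted Margin-of-Victory opinion: both teams receive a bonus of x goals."""
    if x <= 0:
        raise ValueError("the bonus x must be positive")
    total = goals_for + goals_against + 2 * x
    return ((goals_against + x) / total, (goals_for + x) / total)


narrow = gmv_opinion_about(1, 0)      # 1-0 win
wide = gmv_opinion_about(4, 0)        # 4-0 win, now distinguishable from 1-0
goalless = gmv_opinion_about(0, 0)    # 0-0 tie needs no special case
swamped = gmv_opinion_about(4, 0, x=1000.0)   # very large bonus: close to 0.5 / 0.5
```

Unlike the MV strategy, clean-sheet wins with different margins now yield different opinions, and the 0-0 tie is handled by the same formula.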

A further improvement of the GMV strategy comes from normalization. Normalization
is motivated by the observation that, since $X > 0$, the term $ng(\alpha) + X$
(resp., $ng(\beta) + X$) is strictly less than $ng(\alpha) + ng(\beta) + 2X$. Hence, $\alpha$ (resp.,
$\beta$) will never get an opinion where the probability of good comes close to 1,
even if it has scored many more goals than $\beta$ (resp., $\alpha$). At the same time,
since $ng(\alpha) + X > 0$, there is no chance that $\alpha$ will get an opinion where the
probability of bad is close to 0.

Let $p_{\alpha,\beta}(G)$ be the probability of $\alpha$ being evaluated good by $\beta$, according
to the GMV strategy. We then normalize $p_{\alpha,\beta}(G)$ to the [0,1] range by considering,
for a given set $S$ of teams, the highest and lowest probabilities of being
evaluated good according to the calculations of the GMV strategy, which we denote
as $M(S)$ and $m(S)$, respectively. We then define the normalized probability
$p_G(\alpha)$ of team $\alpha$ being evaluated good by $\beta$ as follows:

$$p_G(\alpha) = \frac{p_{\alpha,\beta}(G) - m(S)}{M(S) - m(S)} \qquad (11)$$

And the probability of team $\alpha$ being evaluated bad by $\beta$ becomes: $p_B(\alpha) = 1 - p_G(\alpha)$.
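The normalization step amounts to a min-max rescaling of the GMV probabilities of being evaluated good. The Python sketch below illustrates it on a hypothetical list of values; the function name `normalise_good` and the handling of the degenerate case where all values coincide are assumptions of this example.

```python
def normalise_good(p_good_values):
    """Rescale GMV 'good' probabilities to [0, 1] using their lowest and highest values."""
    lowest, highest = min(p_good_values), max(p_good_values)
    if highest == lowest:               # all equal: no spread to rescale (assumption)
        return [0.5] * len(p_good_values)
    return [(p - lowest) / (highest - lowest) for p in p_good_values]


# Hypothetical GMV values for a 0-0 tie, a 1-0 win and a 4-0 win (with X = 1).
p_good = normalise_good([0.5, 2.0 / 3.0, 5.0 / 6.0])
p_bad = [1.0 - p for p in p_good]
```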


6 From Reputation to Predictions 

This section illustrates how we can use MORE to predict the outcome of a
football match. We note that a football match may be depicted as an ordered
pair $(\alpha, \beta)$, where $\alpha$ and $\beta$ are opponent clubs. We will follow this convention:
we will let $\alpha$ be the ‘home club’ whereas $\beta$ will be the ‘visiting club’. To compute
the reputation of teams $\alpha$ and $\beta$, we define the relative strength of $\alpha$ w.r.t. $\beta$ at
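time $t$ as follows:

$$r^t_{\alpha,\beta} = \frac{\mathcal{R}^t_\alpha}{\mathcal{R}^t_\alpha + \mathcal{R}^t_\beta} \qquad (12)$$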
In what follows, and for simplification, we omit the reference to time $t$ and we
use the simplified notation $r_{\alpha,\beta}$. Notice that $r_{\alpha,\beta}$ lies between 0 and 1, and the higher (resp.,
lower) $r_{\alpha,\beta}$ is, the stronger (resp., weaker) the club $\alpha$ is at playing and winning
a football match. We shall adopt the following rules to predict the outcome of a
match (the approximate comparisons used are explained in a footnote):

1. If $r_{\alpha,\beta} \gtrsim 1/2$ then the winner will be $\alpha$.

2. If $r_{\alpha,\beta} \approx 1/2$ then the match will end up in a draw.

3. If $r_{\alpha,\beta} \lesssim 1/2$ then the winner will be $\beta$.
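The three rules can be sketched in Python as follows. Several details here are assumptions made only for illustration: the relative strength is taken to be the home club's reputation divided by the sum of both reputations, the reference value is taken to be 1/2, and "approximately equal" is modelled by a hypothetical tolerance `band` around it.

```python
def relative_strength(rep_home, rep_away):
    """Assumed form: share of the home club's reputation in the total."""
    total = rep_home + rep_away
    return 0.5 if total == 0 else rep_home / total


def predict(rep_home, rep_away, band=0.05):
    """Map relative strength to 'home', 'draw' or 'away' using three intervals.
    The width of the draw interval (band) is a hypothetical parameter."""
    r = relative_strength(rep_home, rep_away)
    if r > 0.5 + band:
        return "home"
    if r < 0.5 - band:
        return "away"
    return "draw"


outcome = predict(0.8, 0.2)   # a much stronger home club
```

The choice of the three intervals is a tuning decision; a wider band predicts more draws.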

7 Experimental Results 

In this section, we test the effectiveness of our approach. In detail, we designed 
our experiments to answer the following questions: 

Q1. What is the accuracy of the MORE algorithm in correctly predicting the
outcome of a football match?

Q2. Which score-to-opinion strategy is reliably the most accurate?

Q3. To what extent does information decay impact the accuracy of MORE?

Footnote: We note that we look for values that are approximately greater than ($\gtrsim$), approximately
less than ($\lesssim$), or approximately equal to ($\approx$) $1/2$. In practice, this is achieved by
defining three different intervals to describe this.
7.1 Datasets and Experimental Procedure

To answer questions Q1-Q3, we ran several experiments, drawing on a large
dataset of match scores that we collected from public sources (listed in the footnote
to this section). Our dataset
contains the complete scores of several seasons of the Spanish Primera Division
(Liga), the top football league in Spain. At the moment of writing, 20 clubs play
in the Liga. Each club plays every other club twice, once at home and once when
visiting the other club. Points are assigned according to the 3/1/0 schema: 3 for
a win, 1 for a draw and 0 for a loss. Clubs are ranked by the total number of points
they accumulate and the highest-ranked club at the end of the season is crowned
champion. The dataset consists of 8182 matches from the 1928-29 season until
the 2011-12 season. Overall, the home club won 3920 times and lost 2043 times,
and the number of ties amounted to 2119.

For the football domain, a major goal of our experimental tests was to check
MORE’s predictive accuracy. For each match in our database involving clubs
$\alpha$ (home club) and $\beta$ (visiting club) we separately applied the Naive, MV and
GMV strategies to convert the outcomes of a football match into opinions. We
then applied the MORE algorithm and computed the relative strength $r_{\alpha,\beta}$ of
$\alpha$ against $\beta$. We tried various configurations of the decay parameter $\nu$ in order
to study how the tuning of this parameter influences the overall predictive performance
of MORE. The usual 3/1/0 scoring system for football rankings (and
other games) provided us with a baseline to study the predictive accuracy of
MORE.

The experimental procedure we followed to compare the predictive accuracy
of MORE and 3/1/0 was as follows. We partitioned the dataset containing
football matches into 10 intervals, $I_1, I_2, \ldots, I_{10}$, on the basis of the relative
strength of opponent clubs. In detail, for an arbitrary pair of clubs $\alpha$ and $\beta$, the
first interval $I_1$ was formed by the matches such that $0 \le r_{\alpha,\beta} < 0.1$, the second
interval $I_2$ contained the matches for which $0.1 \le r_{\alpha,\beta} < 0.2$, and so on until the
tenth interval $I_{10}$ (consisting of the matches in which $0.9 \le r_{\alpha,\beta} \le 1$). Observe
that the intervals may have different sizes (because, for instance, the number of
matches in $I_1$ could differ from that in $I_2$). Given an interval $I_k$, we have that
the larger $k$, the better the skills of $\alpha$ are and, then, the more likely $\alpha$ should be
able to beat $\beta$.
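The partition into ten intervals can be sketched in Python as below. The function names and the representation of a match as a pair `(match_id, relative_strength)` are hypothetical; the boundary convention (left-closed intervals, with 1 assigned to the tenth interval) is an assumption of this example.

```python
def interval_index(r):
    """Map a relative strength r in [0, 1] to the index k of its interval, 1..10."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("relative strength must lie in [0, 1]")
    return min(int(r * 10), 9) + 1


def partition(matches):
    """Group (match_id, relative_strength) pairs into the ten intervals."""
    buckets = {k: [] for k in range(1, 11)}
    for match_id, r in matches:
        buckets[interval_index(r)].append(match_id)
    return buckets


# Hypothetical matches identified by integers.
buckets = partition([(1, 0.05), (2, 0.55), (3, 0.95), (4, 1.0)])
```

Since the boundaries are multiples of 0.1, values lying exactly on a boundary are subject to floating-point rounding; an implementation that needs exact boundary behaviour may prefer decimal or integer arithmetic.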

For different strategies and parameter settings, we computed the percentage
of times ($F_H(k)$) that MORE accurately predicted the outcome of matches
in the $I_k$ interval that ended with the victory of the home club. Accordingly,
we refer to $F_H(k)$ as the home success frequency. In an analogous fashion, we
computed the percentage of times ($F_A(k)$) that MORE accurately predicted the
outcome of matches in the $I_k$ interval that ended with the victory of the visiting
club. Accordingly, we refer to $F_A(k)$ as the visiting success frequency. We would
expect that the higher $r_{\alpha,\beta}$ is, the higher $F_H(k)$. In fact, as $r_{\alpha,\beta} \to 1$, MORE becomes
more and more confident in the ability of $\alpha$ to beat $\beta$ and, therefore,
we expect $F_H(k)$ to be correspondingly large. The situation for $r_{\alpha,\beta}$ is similar:
its increase corresponds to a decrease of $r_{\beta,\alpha}$ and, therefore, an increase of $r_{\beta,\alpha}$
should correspond to a decrease in the frequency of (home club) $\alpha$ wins.

Footnote: Data were extracted from http://www.lfp.es/LigaBBVA/Liga_BBVA_Resultados.aspx

In the following, when it does not generate confusion, we shall use the simplified
notation $F_H$ (resp., $F_A$) in place of $F_H(k)$ (resp., $F_A(k)$) because, for a
fixed match $(\alpha, \beta)$, we can immediately identify the interval $I_k$ to which $(\alpha, \beta)$
belongs and, therefore, the index $k$ in $F_H(k)$ (resp., $F_A(k)$) becomes redundant.

7.2 Assessing the Quality of Predictions 

The first series of experiments we performed aimed at assessing the accuracy of
the predictions with respect to the different strategies. The results are plotted in
Figures 2, 3 and 4 for the Naive, MV, and GMV strategies, respectively. In each figure,
the plot on the left represents the frequency of successful predictions for the
home team winning, and that on the right represents the frequency of successful
predictions for the visiting team winning.

Fig. 2. Naive strategy: success frequencies for $F_H$ and $F_A$ (resp.) over relative strength.

Fig. 3. MV strategy: success frequencies for $F_H$ and $F_A$ (resp.) over relative strength.

Fig. 4. GMV strategy: success frequencies for $F_H$ and $F_A$ (resp.) over relative strength.

From the analysis of these results we can draw some relevant conclusions. The
Naive strategy, despite its simplicity and independence from the final score
of the match, is able to generate accurate predictions. In fact, Naive is often
a better forecaster than ranking-based prediction (i.e., using the 3/1/0 point
system). For home victories, the maximum value of $F_H$ is around 66% whereas
the 3/1/0 algorithm peaks at around 59%. For away victories, the values of
$F_A$ range between 25% and 50% whereas the 3/1/0 algorithm has its success
frequency flat around 30%.

It is also interesting to observe that the decay factor $\nu$ has little impact on the
values of both $F_H$ and $F_A$. In particular, the peak value of $F_H$ is obtained when
$\nu = 0.7$ but the value $\nu = 0.5$ provides more stable results. In contrast, setting
$\nu = 0.5$ is the best option for visiting victories, even if the curves describing the
evolution of $F_A$ tend to coincide when the relative strength (depicted as rg in
Figures 2-4) is greater than 0.5.

Let us now consider the MV strategy, whose results are reported in Figure 3.
This second experiment provides evidence of an increase in the accuracy
of MORE, as the highest value of $F_H$ is now equal to 64% and the highest value
of $F_A$ is equal to 46%. This suggests that including the number of goals scored
by each team in the process of generating opinions is effective in better computing
the strength of each club and, ultimately, in producing more accurate
predictions. From these figures we can also conclude that for both home and
visiting victories, $F_H$ and $F_A$ achieve their peak when $\nu = 0.6$. But the trends of
the curves depicted in Figure 3 are quite similar. This implies that information
decay has little impact when the MV strategy is chosen.

Finally, we consider the GMV strategy. Once again, we computed Fh and Fa 
for different values of ν and the corresponding results are graphically reported 
in Figure 4. 

This last experiment illustrates that the GMV strategy (with X = 1) provides 
the highest values of Fh and Fa. The best value of Fh is around 78% (while 
Fh associated with the 3/1/0 algorithm does not exceed 59%). Analogously, in 
the case of visiting victories, the best value of Fa is equal to 68% (while the 
3/1/0 algorithm is not able to go beyond 37%). 

The value of ν providing the peak values of Fh and Fa was 0.6, even though 
the information decay has little impact, as in the case of the MV strategy. 

We conclude this section by observing that when the relative strength is less 
than 0.3, the value of Fh is around 0.5, independently of the adopted strategy. 
This result is clearly superior to a pure guessing strategy, where outcomes are 
chosen uniformly at random and the probability of guessing the correct result 
is 1/3 (as there are three possible outcomes: α winning, β winning, or neither, 
that is, a draw). 
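
The 1/3 figure for uniform guessing holds whatever the actual distribution of match outcomes, because each outcome is guessed with probability 1/3 and the outcome probabilities sum to 1. The short Python sketch below checks this with exact fractions; the function name `uniform_guess_success` and the example outcome distribution are illustrative assumptions, not data from our experiments.

```python
from fractions import Fraction


def uniform_guess_success(outcome_probs):
    """Probability that a uniformly random guess matches the outcome.

    `outcome_probs` maps each possible outcome to its true probability
    (values are assumed to sum to 1).
    """
    k = len(outcome_probs)
    # P(correct) = sum over outcomes of P(guess = o) * P(outcome = o).
    return sum(Fraction(1, k) * p for p in outcome_probs.values())


if __name__ == "__main__":
    # Hypothetical distribution: home wins are more likely than draws or away wins.
    example = {
        "alpha wins": Fraction(1, 2),
        "beta wins": Fraction(1, 4),
        "draw": Fraction(1, 4),
    }
    print(uniform_guess_success(example))
```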


8 Conclusion 

This paper proposed a reputation model based on a probabilistic modelling of 
opinions, a notion of information decay, an understanding that the reputation 
of an opinion holder provides an insight into how reliable his/her opinions are, 
as well as an understanding that the more certain an opinion is, the greater its 
weight, or impact. 

An interesting aspect of this model is that it may be used in domains rich 
in explicit opinions, as well as in domains where explicit opinions are sparse. 
In the latter case, implicit opinions are extracted from behavioural information. 
This paper has also proposed an approach for extracting opinions from 
behavioural information in the sports domain, focusing on football in particular. 

In the literature, several ranking algorithms exist that are also based on the 
notion of implicit opinions. For instance, PageRank [T] and HITS [5] compute 
the reputation of entities based on the links between these entities. Indirectly, 
their approach assumes that a link describes a positive opinion: one links to the 
“good” entities. Both have been applied successfully in the context of web search. 
In HU, ranking algorithms like PageRank and HITS were applied to the social 
network to find experts in the network based on who is replying to the posts 
of whom. In [5], HITS has been used in a similar manner to help find experts 
based on who is replying to the emails of whom. EigenTrust [S] calculates the 
reputation of peers in P2P networks by relying on the number of downloads that 
one peer has made from another. In [3], a personalised version of PageRank 
that also relies on the download history is used to find trustworthy peers in P2P 
networks. Also, CiteRank m and SARA [5] are algorithms that rank research 
work by interpreting a citation as a positive opinion about the cited work. 
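
To make the "link as a positive opinion" reading concrete, the following Python sketch runs a simplified, textbook-style PageRank power iteration over a small directed graph. It is not the implementation used in any of the cited works; the function name `pagerank`, the damping value 0.85, the fixed iteration count, and the even redistribution of rank from nodes without outgoing links are all assumptions of this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each node to the list of nodes it links to; a link is
    read as a positive opinion about its target.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # Split the damped rank of v among the nodes it links to.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumed handling of dangling nodes: spread rank evenly.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: a and b both link to c, and c links back to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 3))
```

In this toy graph the node that receives links from two others ends up with the highest score, while the node that nobody links to ends up with the lowest.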

In comparison, we note that MORE is more generic than existing ranking 
algorithms, since it has the power to incorporate both explicit and implicit 
opinions in one system. Although built upon previous work, MORE also introduces 
the novel idea of considering the certainty of an opinion as a measure of its 
weight, or impact, when aggregating the group members’ opinions. Finally, the 
model is validated by evaluating its performance in predicting the scores of 
football matches. We consider the football league scenario particularly interesting 
because it describes well the opportunities and limitations of the mechanisms by 
which we would like to evaluate reputation, and thus estimate the true strength of 
agents in general. Furthermore, we note that unlike the sophisticated predictive 
models in use today (e.g., Goldman Sachs’ model that was used for World Cup 
2014, and relied on around a dozen statistical/historical parameters), MORE 
relies solely on game scores. In other words, it requires no tuning of complex 
parameters, and yet its predictions are reasonably accurate. 